OrienTel-Turkish: telephone speech database description and notes on the experience

Authors

  • Tolga Çiloglu
  • Dinc Acar
  • Ahmet Tokatli
Abstract

OrienTel-Turkish contains telephone speech recordings and annotations from 1700 Turkish speakers, balanced in gender, dialect, age and calling environment; approximately one third of the calls were made over the fixed network and the rest over the mobile network. Each speaker contributes 48 items comprising digits, digit/number strings, time/date expressions, phonetically rich words and sentences, command words, and answers to spontaneous questions. The paper describes the contents of the completed database and presents notes on the experience gained in preparing the textual content, recruiting speakers, annotating, and correcting errors. SAMPA-Turkish was created during this work.
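
The figures in the abstract imply a rough size for the database. The following is a minimal sketch of that arithmetic, not a statement of the actual delivered counts: it assumes every speaker completed all 48 items and made exactly one call, neither of which the abstract confirms. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
SPEAKERS = 1700
ITEMS_PER_SPEAKER = 48
FIXED_SHARE = 1 / 3  # "approximately one third", so results are estimates


def nominal_item_count(speakers: int, items_per_speaker: int) -> int:
    """Upper bound on recorded items, assuming no speaker skipped an item."""
    return speakers * items_per_speaker


def approximate_network_split(speakers: int, fixed_share: float) -> tuple[int, int]:
    """Approximate (fixed, mobile) call counts, assuming one call per speaker."""
    fixed = round(speakers * fixed_share)
    return fixed, speakers - fixed


total_items = nominal_item_count(SPEAKERS, ITEMS_PER_SPEAKER)
fixed_calls, mobile_calls = approximate_network_split(SPEAKERS, FIXED_SHARE)
print(total_items, fixed_calls, mobile_calls)
```

Under these assumptions the nominal total is 81,600 items, with roughly 567 fixed-network and 1,133 mobile calls; the real database may differ after error correction and rejected recordings.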

Related resources

OrienTel: recording telephone speech of Turkish speakers in Germany

OrienTel is a project to create telephone speech databases for both the local and the business languages of the Mediterranean and the Arab Emirates. In Germany, 300 Turkish speakers speaking German were to be recorded. The database is an extension of the SpeechDat databases. This paper outlines the recording setup, the recruitment strategy and the annotation procedure. Recruiting the speakers w...

OrienTel – Arabic speech resources for the IT market

A survey of the language resources market clearly shows that the Arabic language is still a stepchild of international R&D efforts in the field of speech recognition. OrienTel for the first time makes an effort to create speech data on a large scale. It does so by profiting from the experience of previous SpeechDat projects and from the European Commission’s policy to embrace non-EU Mediterrane...

OrienTel - Telephony Databases Across Northern Africa and the Middle East

OrienTel is a project that over the past two and a half years developed speech databases and phonetic standards across Northern Africa, the Middle East and the Arabian Gulf. The project is funded by the European Commission and is coordinated by ScanSoft (Germany and Belgium). Other partners are ELDA (France), IBM (Germany), NSC (Israel), Siemens (Germany), Lucent (UK), Knowledge, the University o...

OrienTel - Multilingual access to interactive communication services for the Mediterranean and the Middle East

OrienTel is a project funded within the European Commission’s IST framework that focuses on collecting linguistic data for telephony-based IT applications across the Mediterranean and the Middle East. Languages covered in this SpeechDat-based project are Cypriote Greek, Turkish, Hebrew, different varieties of Arabic, French, English and German. Within the project’s lifetime of 30 months, starti...

OrienTel: speech-based interactive communication applications for the Mediterranean and the Middle East

In this paper, we introduce a new European project named OrienTel. The aim of OrienTel is to enable the project's participants to design and develop multilingual interactive communication services for the Mediterranean and the Middle East, ranging from Morocco in the West to the Gulf states in the East, including Turkey and Cyprus. These multilingual applications will be largely speech-based an...

Publication date: 2004